An Information-Theoretic Optimality Principle for Deep Reinforcement Learning
Abstract
We methodologically address the problem of Q-value overestimation in deep reinforcement learning, with the aim of handling high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal that encourages reduced Q-value estimates. The resulting algorithm encompasses a wide range of learning outcomes and contains deep Q-networks as a special case. Different learning outcomes can be obtained by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity.
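The abstract does not state the update rule itself. As a minimal sketch, assuming the penalty takes a free-energy (prior-weighted log-sum-exp) form with the Lagrange multiplier `lam` as inverse temperature, the bootstrap target could be computed as below. The function names `soft_value` and `soft_target`, the uniform prior, and the default discount `gamma` are hypothetical choices made for illustration and are not taken from the paper. Under this assumption, a large `lam` approaches the hard maximum used by deep Q-networks, while a small `lam` approaches the prior-weighted mean, which yields lower value estimates.

```python
import math


def soft_value(q_values, lam, prior=None):
    """Prior-weighted log-sum-exp of Q-values, scaled by 1/lam (lam > 0).

    Assumed form: (1/lam) * log(sum_a prior[a] * exp(lam * Q[a])).
    """
    n = len(q_values)
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over actions: an assumption
    # Subtract the max before exponentiating for numerical stability.
    q_max = max(q_values)
    total = sum(p * math.exp(lam * (q - q_max)) for p, q in zip(prior, q_values))
    return q_max + math.log(total) / lam


def soft_target(reward, next_q_values, lam, gamma=0.99, done=False):
    """One-step bootstrap target using the soft value of the next state."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, lam)


if __name__ == "__main__":
    q_next = [1.0, 2.0, 3.0]
    for lam in (1e-3, 1.0, 1000.0):
        print(lam, soft_value(q_next, lam))
```

In this sketch the soft value never exceeds `max(q_values)`, so the target is never larger than the standard Q-learning target for the same inputs; how `lam` should be scheduled during training is left open here.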
Similar resources
Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach
This paper presents the application of reinforcement learning in automatic analog IC design. In this work, the Multi-Objective approach by Learning Automata is evaluated for accommodating required functionalities and performance specifications considering optimal minimizing of MOSFETs area and power consumption for two famous CMOS op-amps. The results show the ability of the proposed method to ...
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the ch...
Optimality Theoretic Account of Acquisition of Consonant Clusters of English Syllables by Persian EFL Learners*
This study accounts for the acquisition of the consonant clusters of English syllable structures both in onset and coda positions by Persian EFL learners. Persian syllable structure is "CV(CC)", composed of one consonant at the initial position and two optional consonants at the final position; whereas English syllable structure is "(CCC)V(CCCC)". Therefore, Persian EFL learners need to resolve...
Reinforcement Learning for Control
Reinforcement learning (RL) offers a principled way to control nonlinear stochastic systems with partly or even fully unknown dynamics. Recent advances in areas such as deep learning and adaptive dynamic programming (ADP) have led to significant inroads in applications from robotics, automotive systems, smart grids, game playing, traffic control, etc. This open track provides a forum of interac...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
Journal:
- CoRR
Volume: abs/1708.01867, Issue: -
Pages: -
Publication date: 2017